Wireless Communications in the Era of Big Data

Suzhi Bi, Rui Zhang, Zhi Ding, and Shuguang Cui 

Abstract 

The rapidly growing wave of wireless data service is pushing against the boundary of our communication
network’s processing power. The pervasive and exponentially increasing data traffic presents
imminent challenges to all aspects of wireless system design, such as spectrum efficiency,
computing capabilities and fronthaul/backhaul link capacity. In this article, we discuss the challenges
and opportunities in the design of scalable wireless systems to embrace such a “bigdata” era. On one
hand, we review the state-of-the-art networking architectures and signal processing techniques adaptable
for managing the bigdata traffic in wireless networks. On the other hand, instead of viewing mobile
bigdata as an unwanted burden, we introduce methods to capitalize on the vast data traffic, for building
a bigdata-aware wireless network with better wireless service quality and new mobile applications. We
highlight several promising future research directions for wireless communications in the mobile bigdata
era.


I. Introduction 

Decades of exponential growth in commercial data services have ushered in the so-called “bigdata” era,
to which the expansive mobile wireless network is a critical data contributor. As of 2014, the global
penetration of mobile subscribers has reached 97%, producing a staggering 10.7 exabytes (10.7 × 10^18 bytes)
of mobile data worldwide. The surge of mobile data traffic in recent years is mainly attributed to the
popularity of smartphones, phone cameras, mobile tablets and other smart mobile devices that support
mobile broadband applications, e.g., online music, video and gaming, as shown in Fig. 1. With a compound
annual growth rate of over 40%, it is expected that the mobile data traffic will increase by 5 times from
2015 to 2020.

Fig. 1. Some example sources of wireless bigdata traffic.


In addition to the vast amount of wireless source data, modern wireless signal processing often amplifies
the system’s pressure from bigdata in pursuit of higher performance gain. For instance, MIMO antenna
technologies are now extensively used to boost throughput and reliability at both mobile terminals (MTs)
and base stations (BSs) of high speed wireless services. This, however, also increases the system data
traffic to be processed in proportion to the number of antennas in use. Moreover, the 5G (the fifth
generation) wireless network presently under development is likely to migrate the currently hierarchical,
BS-centric cellular architecture to a cloud-based layered network structure, consisting of a large number
of cooperating wireless access points (APs) connected by either wireline or wireless fronthaul links to 
a bigdata capable processing central unit (CU). New wireless access structures, such as coordinated 
multipoint (CoMP or networked MIMO) [1], heterogeneous network (HetNet) [2] and cloud-based radio
access network (C-RAN) [3], are under development to achieve multi-standard, interference-aware and
energy-friendly (green) wireless communications. In practice, the use of cooperating wireless APs could 
easily generate multiple Gbps data from a single user’s fronthaul links due to the need for baseband joint 
processing, such that the high traffic load may overwhelm the fronthaul link or the system computing 
unit for signal processing and coordination. Such intensely high system traffic volume, together with the 
rapidly growing mobile data source volume, surpasses both the processing power improvement speed of 
our current computing capabilities and the fronthaul/backhaul link rate increase pace of our networking
systems. It necessitates a new wireless architecture along with efficient signal processing methods to 
make wireless systems scalable to continued growth of data traffic. 

On the other hand, timely and cost-efficient information processing is made possible by the fact that the
vast volume of mobile data traffic is not completely chaotic and hopelessly beyond management. Rather,
it often exhibits strong insightful features, such as user mobility patterns and the spatial, temporal and social
correlations of data contents. These special characteristics of mobile traffic present us with opportunities 
to harness and exploit bigdata for potential performance gains in various wireless services. To effectively 
utilize and exploit these characteristics, they should be identified, extracted, and efficiently stored. For 
instance, caching popular contents at wireless hot spots could effectively reduce the real-time traffic in the 
fronthaul links. Additionally, network control decisions, such as routing, resource allocation, and status 
reporting, instead of being rigidly programmed, could be made data-driven to fully capture the interplay 
between bigdata and network structure. Presently, however, these advanced data-aware features cannot
be efficiently implemented in current wireless systems, which are mainly designed for content delivery
rather than for analyzing and making use of the data traffic.

Bearing in mind the aforementioned challenges and opportunities brought by bigdata traffic, we
address in this article two important problems of wireless communication system design in the bigdata
era:

Q1: What may constitute a scalable wireless network architecture for efficient handling of bigdata traffic?
Q2: How to effectively incorporate and utilize the bigdata awareness to improve the wireless system
performance?

Specifically, to answer Q1, we introduce in Section II a hybrid signal processing paradigm to enable
flexible data processing at both the BS/AP and the CU levels, and correspondingly a number of scalable
data traffic management techniques to balance the conflicting needs of overall system performance
and data processing complexity. For Q2, we first discuss in Section III typical bigdata features and
efficient data analytics to extract these features. Next, we introduce a number of bigdata-aware signal
processing methods and wireless networking structures to capitalize on bigdata interplay, such as mobile
cloud processing, crowd computing, and software-defined networking. We also suggest in Section IV
several future research directions for wireless communications in the bigdata era. Finally, we conclude
this article in Section V.

Before proceeding to detailed discussions, it is worth mentioning that the considered scalable network 
structure and bigdata awareness are both important mechanisms for accommodating mobile bigdata in
future wireless networks, though they focus differently on the physical and network/application layers, 
respectively. Nonetheless, the two solutions can also be complementary to each other. For instance, 
as we will discuss later, we can optimize the overall caching strategy by combining long-term cache 
provisioning (network/application layer) and real-time cache-assisted signal processing (physical layer) 
techniques. In addition, although this article focuses on the design aspects of cellular networks in the 
bigdata era, most of the key enabling mechanisms for mobile bigdata processing are also applicable to 
other wireless networking structures, such as wireless local area networks (WLANs) and heterogeneous 
networks. Some representative system designs are also discussed in this article. 

II. Scalable Wireless Bigdata Traffic Management
A. A hybrid network structure 

Neither the current cellular systems nor the next-generation cloud-based C-RAN [3] under development
was designed to provide a scalable solution for the arrival of the bigdata era. The current 3G and 4G
cellular systems exemplify a BS-centric design, in which a BS bears most of the responsibilities of radio
access, baseband processing and radio resource control execution to serve the mobile users in the vicinity.
To meet the fast growing mobile data service demand, a smaller cell size is commonly used to improve
frequency reuse, which may generate complex and severe inter-cell interference. Furthermore, small cells
can also be costly because of the dense deployment of BSs. The cloud-centric network proposed for 5G
mitigates the inter-cell interference by centralized signal processing and reduces the unit cell deployment
cost by moving computations to the “cloud”. At the same time, only inexpensive relay-like remote radio
heads (RRHs) are used for radio frequency (RF) level wireless access. However, such a fully centralized
scheme may be overwhelmed by the huge wave of data traffic beyond its fronthaul link capacities and
its computational power.

Alternatively, a hybrid structure could take advantage of the benefits from the two design paradigms; 
that is, a wireless system that could adaptively choose only local processing at the BS-level, or only central 
processing at the CU-level, or parallel processing at both levels, based on, for instance, physical channel 
conditions and correlations in the data contents. We thus consider the generic network structure
shown in Fig. 2, which mainly inherits the skeleton of C-RAN, but has integrated several programmable
modules to carry out intelligent signal processing at the BS level. In the radio access network, mobile 
users could be served simultaneously by multiple BSs, where each BS is equipped with multiple antennas 

Fig. 2. A hybrid CU-BS processing network structure.


and is linked to the CU via high-speed fiber/wireless fronthauls for exchanging user data and control
signals. The CU is further connected to the backhaul core network for external content access. In the
proposed hybrid network structure, baseband processing units (BPUs) are available at both the BSs and
CU, which enable user message encoding/decoding at both levels. In addition, learning units (LUs) are
installed for data traffic analytics, whose functions will be detailed in Section III. Caches are also installed
at the BSs and CU to save the fronthaul bandwidth consumed for frequent retransmissions of popular
contents.

Before entering the discussions of hybrid signal processing models, it is worth mentioning that applicable
fronthaul data management methods are directly constrained by the fronthaul technologies in use.
Specifically, the system could choose between optical analog and optical/wireless digital fronthaul
technologies. Optical analog modulation using a radio frequency (RF) signal as input is commonly referred
to as radio-over-fiber (RoF). Alternatively, an analog RF input signal could also be quantized and
encoded into binary codewords for digital wireless or fiber-optic communication (DFC). In practice,
RoF is simpler and less expensive than DFC. Furthermore, it also exhibits lower processing delays and
better interoperability with multiple wireless standards, e.g., 3G, LTE and WiFi, as it is oblivious to
the user codebooks and wireless modulation schemes. However, its limitations are also evident, e.g.,
susceptibility to noise and signal distortion, and difficulty of synchronization. More importantly, available
signal processing techniques for fronthaul traffic management using RoF are less sophisticated, generally
limited to simply transforming, or removing certain parts of the received RF signals, e.g., sub-channel 
and antenna selection methods. In contrast, DFC could be combined with data compression, opportunistic 
decoding and many other advanced digital signal processing techniques. In the following, we mainly focus 
on data traffic management methods using digital fronthaul. 
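
To give a feel for the traffic that a digital fronthaul has to carry, the following minimal sketch estimates the raw bit rate of quantized I/Q samples. The function name, the LTE-like sampling rate (30.72 Msample/s), the 15-bit sample width and the antenna count are illustrative assumptions that do not come from this article, and line-coding and control overheads are ignored.

```python
def raw_iq_fronthaul_rate_bps(sample_rate_hz, bits_per_component, num_antennas):
    """Raw bit rate of digitized baseband I/Q samples on one fronthaul link.

    Each complex sample has two real components (I and Q), each quantized
    with `bits_per_component` bits; every antenna contributes its own stream.
    """
    return 2 * sample_rate_hz * bits_per_component * num_antennas


# Hypothetical LTE-like numbers (assumptions, not taken from the article).
rate = raw_iq_fronthaul_rate_bps(sample_rate_hz=30.72e6,
                                 bits_per_component=15,
                                 num_antennas=2)
print(f"{rate / 1e9:.4f} Gbps")
```

Under these assumed numbers, two antennas already approach 2 Gbps of raw samples, and the rate scales linearly with the number of antennas, which is why compression and local decoding matter on a capacity-limited link.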

B. Hybrid signal processing models 

The wireless/fiber-optic link has its own throughput limit. For instance, a commercial fiber-optic link 
normally operates at a link rate in the order of 10 Gbps for digital communication over a single optical 
carrier. Transmission rates beyond the link rate capacity may lead to severe signal distortions, and 
consequently poor decoding performance. Therefore, the system performance must be optimized under 
the fronthaul link capacity constraints. With respect to the hybrid network structure in Fig. 2, we now
introduce some scalable fronthaul data management techniques in three major categories: 

1) Data compression: In the uplink direction, transmitting an analog RF signal perfectly, without any
distortion, from a BS to the CU would require unlimited fronthaul capacity. An analog signal could be more
efficiently transmitted through the fronthaul if it is quantized and compressed into binary codewords.
From an information theoretic perspective, the effect of data compression could be modeled as a test
channel (often Gaussian for simplicity of analysis) for which the uncompressed signals are the input and
the compressed signals are the output. The compression design is equivalent to setting the variance of the
additive compression noise [4]. To achieve successful compression, the encoder needs to transmit to the
decoder at a rate at least equal to the mutual information between the input and the output over the
Gaussian test channel. Intuitively, a tighter fronthaul capacity constraint would therefore require a
coarser compression with a larger compression noise. Existing compression designs in general take the
following approaches.

• Joint compression across different BSs: When multiple BSs compress and forward their received
signals to the CU in uplink, the compression design requires setting the covariance of the compression
noises across different BSs. A common objective is to maximize the information rate under the
fronthaul capacity constraints. In this setting, distributed Wyner-Ziv lossy compression can be used
at the BSs, exploiting signal correlation across the multiple BSs [4]. The distributed Wyner-Ziv
compression scheme is shown to yield significant capacity gains over independent quantization
methods especially in the low backhaul capacity region [4]. Similar data compression methods
could also be applied in downlink. Interestingly, it has been shown in [4] that downlink compression
and multi-user precoding design (for interference mitigation) could be designed separately without 
compromising maximum system throughput, which is achieved by an optimal but much more 
complicated joint compression-precoding design. 

• Independent BS-level compression: The practical implementation of distributed Wyner-Ziv compression
is difficult mainly because of the high complexity in determining the optimal joint compression
codebook and the joint decompressing/decoding at the CU. Accordingly, independent compression 
methods, where the quantization codebook at a BS is only determined by its local signal-to-noise ratio 
(SNR), can be used to reduce the computational complexity and the signaling exchange overhead in 
the fronthaul. 

• Uniform scalar quantization: Even when using independent BS-level compression, real-time computation
and exchange of quantization codebooks using the information-theoretical source coding
approaches are often difficult to realize in practice. Instead, simple uniform scalar quantization
methods compatible with A/D modules are proposed to reduce the implementation cost [5]. Interestingly,
it is shown in [5] that the achievable rate using simple uniform scalar quantization is in fact
close to that of the Gaussian test channel model. This indicates that efficient fronthaul
capacity usage is achievable in practical systems with simple quantization methods.

2) BS-level encoding/decoding: Besides acting as relays to compress/decompress and forward the 
user signals, BSs with advanced baseband processing capabilities could also encode/decode the received 
messages to further improve the system performance under stringent fronthaul capacity constraints. 

• Partial cooperation: In uplink, one direct method to reduce fronthaul traffic is to limit the number 
of cooperating elements when serving mobile users. Many sparsity inducing optimization methods 
could be applied to satisfy a certain quality of service level using minimum numbers of sub-channels, 
antennas or cooperating BSs. In downlink, similar sparse precoding methods could be studied to 
optimize precoders by jointly maximizing the user utilities (e.g., data rate) and minimizing the total 
number of data streams in the fronthaul [6].

• Distributed encoding/decoding: Distributed decoding allows the BSs to decode user messages locally 
without forwarding quantized signals to the CU. For instance, [7] considers a rate-splitting approach

Fig. 3. Throughput performance comparison of three network structures (from left to right): BS-centric, cloud-centric, and hybrid processing
networks. The common-throughput (C-Thr) values achieved by the three networks from (a) to (c) are 210, 230, and 301 Mbps, respectively.


to divide an MT’s message into two parts, where one part is decoded locally by the serving BS
and the remainder is compressed and jointly decoded by the CU. In another case, [3] proposes an
opportunistic hybrid decoding method, where a user’s message is either decoded locally at a BS 
when its SNR is sufficiently high, or jointly decoded by the CU based on signals forwarded from 
a subset of cooperating BSs when the SNR at each individual BS is too low. Note that the locally 
decoded user messages can be used to cancel their interference in the received RF signals at the
BSs, which can effectively reduce the amount of data transmitted to the CU over the fronthaul links.
In the downlink case, BSs could encode and modulate the baseband symbols to RF signals before
transmitting them to the MTs. Therefore, instead of transmitting complete signal waveforms (or
waveform samples) to the BSs, the CU could save fronthaul bandwidth by transmitting separately the
information symbols and the beamforming vectors, while leaving RF modulation to the BSs. 
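
The opportunistic choice between local and joint decoding described above can be sketched as a simple threshold rule. This is an illustrative reading of the idea and not the algorithm of the cited work: the function name is hypothetical, SNRs are linear, rates follow the Shannon formula, and the CU is assumed to combine forwarded signals ideally so that per-BS SNRs add up (compression noise is ignored).

```python
import math


def choose_decoding(snr_per_bs, target_se):
    """Pick local or joint decoding for one user.

    snr_per_bs: dict mapping BS name -> linear receive SNR of the user.
    target_se:  required spectral efficiency in bit/s/Hz.
    Returns (mode, bs_list), where mode is "local", "joint" or "infeasible".
    """
    ranked = sorted(snr_per_bs, key=snr_per_bs.get, reverse=True)
    best = ranked[0]
    if math.log2(1.0 + snr_per_bs[best]) >= target_se:
        return "local", [best]          # only decoded bits go on the fronthaul

    # Otherwise forward signals from as few BSs as possible. Assumption: the CU
    # combines them ideally, so the SNRs add up (compression noise ignored).
    combined, subset = 0.0, []
    for bs in ranked:
        subset.append(bs)
        combined += snr_per_bs[bs]
        if math.log2(1.0 + combined) >= target_se:
            return "joint", subset
    return "infeasible", subset


print(choose_decoding({"BS1": 15.0, "BS2": 1.0}, target_se=3.0))
print(choose_decoding({"BS1": 2.0, "BS2": 3.0, "BS3": 1.0}, target_se=2.5))
```

The first user is strong enough at BS1 to be decoded locally; the second needs the signals of BS2 and BS1 to be forwarded for joint decoding, while BS3 can stay silent on the fronthaul.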

To show the performance advantage of the hybrid signal processing model, we present a numerical 
example in Fig. 3 to compare the throughput performance among the BS-centric, the cloud-centric, and
the hybrid processing networks. Let us consider a cellular uplink, where 3 MTs transmit over orthogonal 
sub-channels, each with 100 MHz bandwidth. Besides, each fronthaul link has 1.2 Gbps capacity. The 
decoding methods of the three networks are described as follows. 

• BS-centric network: BS1 decodes the messages from MT1 and MT2, and BS2 decodes the message
from MT3. Then both the BSs send the decoded messages to the CU;

• Cloud-centric network: both BSs compress the received signals using the scalar quantization method
considered in [5]. They then forward the compressed signals to the CU for joint decoding. In
particular, each user is equally allocated 400 Mbps fronthaul bandwidth at a BS to transmit its
compressed signal;

• Hybrid processing network: BS1 and BS2 first decode the messages from MT1 and MT3, respectively,
before transmitting the decoded messages to the CU. Meanwhile, each BS uses the remaining
fronthaul bandwidth to compress and transmit the signal from MT2 to the CU for the joint decoding
of MT2’s message.

From the aforementioned network setups, we calculate in Fig. 3 the achievable user data rates under a
random channel realization, and compare the common-throughput performance (the minimum data rate
among the three users) in different cases. We can see that the BS-centric network achieves the lowest
common-throughput, owing to the low data rate of the cell-edge user MT2, which is only 210 Mbps. The
cloud-centric network slightly improves the data rate of MT2 and hence the common-throughput to 230
Mbps, thanks to its joint processing gain. However, the data rates of MT1 and MT3 are severely degraded,
since the limited fronthaul capacity introduces high compression noises to the useful signals. The hybrid
processing network achieves the highest common-throughput (301 Mbps) among the three schemes that
we considered, which is 43% and 31% higher than those of the BS-centric and cloud-centric networks,
respectively. Compared to the cloud-centric networks, by decoding the messages from MT1 and MT3 at
the BS-level, the hybrid processing network has a larger fronthaul bandwidth to spare for transmitting
MT2’s signals to the CU with more refined compression, thus achieving a higher joint processing gain.
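
The quoted percentages and the equal fronthaul split can be checked with a few lines of arithmetic. The snippet uses only the numbers stated in the example (1.2 Gbps per fronthaul link, three users, and the three common-throughput values); the variable and function names are illustrative.

```python
FRONTHAUL_CAPACITY_MBPS = 1200   # per fronthaul link, from the setup above
NUM_USERS = 3

# Common-throughput values reported for Fig. 3 (Mbps).
common_throughput = {"BS-centric": 210, "cloud-centric": 230, "hybrid": 301}


def relative_gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


equal_share = FRONTHAUL_CAPACITY_MBPS / NUM_USERS   # cloud-centric split per user
gain_vs_bs = relative_gain_percent(common_throughput["hybrid"],
                                   common_throughput["BS-centric"])
gain_vs_cloud = relative_gain_percent(common_throughput["hybrid"],
                                      common_throughput["cloud-centric"])
print(equal_share, round(gain_vs_bs), round(gain_vs_cloud))   # 400.0 43 31
```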

3) Cache-assisted processing: In downlink transmission, caching at the BSs is a cost-effective way to reduce
real-time traffic on the fronthaul, thereby enabling significant improvement of the overall C-RAN performance.
Cache-assisted wireless resource allocation is a cross-layer approach that incorporates the status
of application-layer data flow in wireless physical-layer design. As an illustrative example, in Fig. 4 BS2
serves two requests from the two MTs, whereas the caches of the other two BSs are empty. Although MT1
is closer to BS1 with a better wireless channel condition, the maximum downlink data rate is only 1 unit
per second if BS1 is selected to transmit directly, due to the constraint of link congestion between BS1
and the CU. Instead, the CU could select BS2 to send the cached contents to MT1 at a rate of 2 units per
second, whose end-to-end data rate is not constrained by the congestion level of the CU-to-BS2 link. On
the other hand, MT2 could be served by two cooperating BSs (BS2 and BS3) with an improved wireless

Fig. 4. Downlink cache-assisted wireless signal processing.


channel gain from coordinated beamforming. In particular, the CU only needs to transmit the content
requested by MT2 to BS3 before the cooperative transmissions of the two BSs. Thanks to such wireless
cooperation, MT2 could achieve a higher data rate of 3 units per second.
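
The selection logic for MT1 in this example can be written down as a small sketch. It assumes a simplified model in which the end-to-end rate through a BS is the minimum of its wireless rate and its fronthaul rate, unless the requested content is already cached at that BS. The function names are hypothetical, and the wireless rate of 3 units per second for BS1 is an assumed value chosen only so that the congested fronthaul becomes the bottleneck; the resulting end-to-end rates of 1 and 2 units per second match the text.

```python
def end_to_end_rate(wireless_rate, fronthaul_rate, cached):
    """Downlink rate via one BS: a cached object skips the fronthaul hop."""
    return wireless_rate if cached else min(wireless_rate, fronthaul_rate)


def select_bs(candidates):
    """candidates: dict BS name -> (wireless_rate, fronthaul_rate, cached)."""
    rates = {bs: end_to_end_rate(*params) for bs, params in candidates.items()}
    best = max(rates, key=rates.get)
    return best, rates[best]


# Rates in units per second; the wireless rate of BS1 is an assumption.
mt1_candidates = {
    "BS1": (3.0, 1.0, False),   # good channel, congested fronthaul, empty cache
    "BS2": (2.0, 1.0, True),    # weaker channel, content already cached
}
print(select_bs(mt1_candidates))   # ('BS2', 2.0)
```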

In a more general setting, caches could be located at not only the BSs, but also the routers and the CU. 
Furthermore, distributed caching could also be adopted at MTs to allow mobile users to serve popular 
contents requested by nearby peer users in a device-to-device (D2D) manner. We could foresee that
cache-assisted resource allocation methods will become a key enabler of significant bandwidth savings,
since frequent overlapping of requested objects will occur as the volume of mobile traffic increases.
However, it also becomes a more challenging problem to optimize system-wide resource allocation due
to the interplay among cache placement, wireless interference, routing, and the combinatorial nature
of node selections in the wireless network. A more comprehensive understanding of the design tradeoff
remains open for future study.

Another interesting topic in cache-assisted resource allocation is cache provisioning for popular
contents to reduce the real-time backhaul traffic. In particular, cache provisioning addresses the questions 
of what, where and when to cache in the wireless infrastructure. In this case, accurate knowledge of 
the mobile user demand profiles is a key to efficient cache provisioning. The extraction of user demand 
profiles from mobile data traffic is performed by wireless bigdata analytics, which will be discussed in
the next section. 


III. Developing a Bigdata Aware Wireless Network 

Instead of viewing mobile bigdata as a pure burden, we investigate in this section the potential 
performance gain from developing a bigdata-aware intelligent wireless network. However, its efficient 
operation relies on the in-depth knowledge of the wireless bigdata traffic characteristics. As most of such 
characteristics are implicit, we first introduce data-analytical methods necessary to extract these bigdata 
features. We then discuss how to leverage these bigdata characteristics in designing wireless networks to 
capitalize on the mobile bigdata traffic.

A. Useful mobile bigdata features and applications 

There is clearly a strong connection between wireless service usage and human behavioral patterns in 
the physical world. For this reason, wireless data traffic contains strong correlative and statistical features 
in various dimensions, such as time, location and the underlying social relationships. On one hand,
mobile traffic has strong aggregate features. For instance, there exist severe load imbalances spatially and
temporally, such that, presently, 10% of “popular” BSs carry about 50% to 60% of the traffic load. The peak
traffic volume at a given location is much higher than the regular average. These aggregate features could
be exploited to reduce real-time fronthaul/backhaul traffic and to improve wireless network efficiency. 
Example applications include: cell planning according to geographical data usage distribution, peak load 
shifting via load-dependent pricing, and cache provisioning based on aggregate demand profile, among 
others. 
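
An aggregate feature such as spatial load imbalance is straightforward to measure from per-BS traffic counters. The sketch below computes the share of total traffic carried by the busiest fraction of BSs; the function name and the load values are hypothetical and serve only to illustrate the computation.

```python
import math


def top_share(loads, fraction):
    """Share of total traffic carried by the busiest `fraction` of BSs."""
    count = max(1, math.ceil(fraction * len(loads)))
    busiest = sorted(loads, reverse=True)[:count]
    return sum(busiest) / sum(loads)


# Hypothetical per-BS traffic volumes (arbitrary units), not measured data.
loads = [50, 10, 10, 5, 5, 5, 5, 4, 3, 3]
print(top_share(loads, 0.10))   # 0.5
```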

On the other hand, each mobile user’s data usage profile also exhibits a unique set of individual features, 
such as mobility pattern, preference of various data applications, and service quality requirements. For 
instance, a mobile user’s trajectories often consist of a very limited number of frequent positions and
quasi-repetitive patterns. Besides, the recent popularity of mobile social networking interconnects seemingly
uncorrelated individual data usages into a unified social profile, thereby presenting a novel perspective to 
analyze the mobile traffic pattern. These individual and social features are useful for system operators to 
personalize and improve wireless service quality. Many intelligent data-aware services could be provided 
according to user profiles. Examples include resource reservation in handoff using location prediction, 
context-aware personal wireless service adaptation, and mobility-based routing and paging control. 

B. Bigdata analytical tools

The ability to acquire, analyze, and exploit mobile traffic characteristics can be accomplished by 
specially designed learning units (LUs) installed at both the BSs and CUs (see Fig. 2). Their core enabling
factors are the embedded data-analytical algorithms. Some commonly used algorithms for wireless traffic 
analysis and their main applications to wireless communications are classified as follows and summarized 
in Table I. 

1) Stochastic modeling: Stochastic modeling methods use probabilistic models to capture the explicit 
features and dynamics of the data traffic. Commonly used stochastic models include: order-k Markov
model, hidden Markov model, geometric model, time series, linear/nonlinear random dynamic systems, 
etc. For example, Markov models and Kalman filters are widely used to predict user mobility and service 
requirements [8]. The collected user data are often used for parameter estimation of stochastic models,
such as estimating the transition probability matrix of a Markov chain. 
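
As a minimal sketch of this parameter estimation step, the code below fits a first-order Markov chain to a sequence of visited cells by counting transitions (the maximum-likelihood estimate) and predicts the most likely next cell. The function names and the handoff log are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(cell_sequence):
    """Maximum-likelihood transition probabilities of a first-order Markov chain."""
    counts = defaultdict(Counter)
    for current, nxt in zip(cell_sequence, cell_sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def predict_next(transitions, current):
    """Most likely next cell, or None if `current` was never left in the log."""
    row = transitions.get(current)
    return max(row, key=row.get) if row else None


# Hypothetical handoff log of one user (cell identifiers are made up).
log = ["home", "road", "office", "road", "home", "road", "office", "cafe", "office"]
P = estimate_transitions(log)
print(P["road"], predict_next(P, "road"))
```

In this toy log the user leaves "road" three times, twice towards "office", so the estimated probability of that transition is 2/3 and "office" is the predicted next cell.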

2) Data mining: Data mining focuses on exploiting the implicit structures in the mobile data sets. Also 
taking the mobility prediction problem as an example, individual user’s mobility pattern could be extracted 
and discovered by finding the most frequent trajectory segments in the mobility log. Prediction could be 
made accordingly by matching the current trajectory to the mobility profile. Clustering is another useful 
technique to identify the different patterns in the data sets. It is widely used in context-aware mobile 
computing, where a mobile user’s context and behavioral information, such as sleeping and working, are 
identified from wireless sensing data for providing context-related services [9].
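
The frequent-segment idea for mobility prediction can be illustrated as follows. This is a simplified sketch with hypothetical function names and data: it counts contiguous trajectory segments of a fixed length, keeps those seen often enough as the user's mobility profile, and predicts the next cell by matching the most recent cells against the profile.

```python
from collections import Counter


def frequent_segments(trajectories, length, min_count):
    """Count contiguous segments of `length` cells seen at least `min_count` times."""
    counts = Counter()
    for path in trajectories:
        for i in range(len(path) - length + 1):
            counts[tuple(path[i:i + length])] += 1
    return {seg: n for seg, n in counts.items() if n >= min_count}


def predict_from_profile(profile, recent):
    """Match the most recent cells against segment prefixes; return the likeliest next cell."""
    recent = tuple(recent)
    matches = {seg: n for seg, n in profile.items() if seg[:-1] == recent}
    return max(matches, key=matches.get)[-1] if matches else None


# Hypothetical mobility log: one list of visited cells per day.
days = [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["F", "B", "C", "D"]]
profile = frequent_segments(days, length=3, min_count=2)
print(profile, predict_from_profile(profile, ["B", "C"]))
```

Here the segments A-B-C and B-C-D each occur twice and form the profile, so a user currently on B-C is predicted to move to D.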

3) Machine learning: The main objective of machine learning is to establish a functional relationship
between input data and output actions, thus achieving auto-processing capability for unseen patterns of 
data inputs. Among the many useful techniques in machine learning applied to wireless communications, 
classification (determining the type of input data) and regression analysis (data fitting) are two common 
methods, whose applications include context identification of mobile usage and prediction of traffic 
levels (classification), or fitting the distributions of trajectory length, mobile user location, and channel 
holding times (regression). Besides, reinforcement learning, such as Q-learning [10], is useful for taking
proper real-time actions to maximize certain long-term rewards. A typical example is making the handoff 
and admission control decision (action), given the current traffic load (state) and incoming new requests 
(event), in which the reward could be evaluated against the reduction of dropped calls or failed connections. 
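
The admission control example maps onto tabular Q-learning as sketched below. The state encoding (a coarse load level), the two actions and the reward values are assumptions made for illustration; only the update rule itself is the standard Q-learning step.

```python
import random
from collections import defaultdict

ACTIONS = ("admit", "reject")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def choose_action(q, state, epsilon=0.1):
    """Epsilon-greedy policy over the two admission actions."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Toy usage: the state is a coarse load level; the reward values are made up.
q = defaultdict(float)
q_update(q, state="high_load", action="admit", reward=-1.0, next_state="high_load")
q_update(q, state="low_load", action="admit", reward=1.0, next_state="low_load")
print(choose_action(q, "low_load", epsilon=0.0))   # admit
```

A negative reward (for example a dropped call after admitting under high load) lowers the value of admitting in that state, so the greedy policy gradually learns to reject requests when the cell is congested.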

4) Large-scale data analytics: Wireless bigdata poses many challenges to the aforementioned conventional
data-analytical methods due to its high volume, large dimensionality, uneven data qualities, and

Subjects             | Models/algorithms                                                   | Example wireless applications
---------------------+---------------------------------------------------------------------+------------------------------------------------------------
Statistical modeling | Markov models, time series, geometric models, Kalman filters        | mobility prediction, resource provision, device association/handoff prediction
Data mining          | pattern matching, text compression, clustering, dimension reduction | mobility prediction, social group clustering, context-aware processing, cache management, user profile management
Machine learning     | classification algorithms, neural network, regression analysis      | context identification, traffic prediction, fitting trajectory length, user location and the channel holding time
                     | dimension reduction algorithms: PCA, PARAFAC, Tucker3               | user data compression/storage, traffic feature extraction, blind multiuser detection
                     | Q-learning                                                          | handoff and admission controls
                     | primal/dual decomposition, ADMM                                     | distributed routing/rate control and wireless resource allocation
                     | online convex optimization, stochastic learning                     | on-line mobility predictions, handoffs, and resource provisioning
                     | active learning, deep learning                                      | incomplete/complex mobile data processing


the complex features therein. To improve signal processing efficiency, one can combine the following 
complexity reduction techniques with the conventional data analytical tools for large-scale data processing. 

• Distributed optimization algorithms, such as primal/dual decomposition and alternating direction 
method of multipliers (ADMM), are very useful to decouple large-scale statistical learning problems 
into small subproblems for parallel computations so as to relieve both the computational burden at 
the CU and the bandwidth pressures to the fronthaul/backhaul links. 

• Dimension reduction methods are useful to reduce the data volume to be processed while capturing 
the key features of bigdata. Among various methods, principal component analysis (PCA), along with
its many variants, is the most widely used method today. In addition, tensor decomposition methods are
also popular in mobile data processing, which seek to approximately represent a high-order multi-way
array (tensor) as a linear combination of outer products of low-order tensors. By doing so, the
hardware requirement and cost for storing the high-order arrays of mobile data could be reduced. 

• Other advanced learning methods could be used to handle incomplete or complex data sets. Interesting
examples include active learning, which deals with partially labeled data sets; online learning
for responding in real-time to sequentially received data; stochastic learning that makes a decision 
periodically in each time interval; and deep learning for modeling complex behaviors contained in 
a data set. 
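
As a minimal sketch of how ADMM decouples a global problem into parallel subproblems, consider a toy consensus problem (an illustrative assumption, not an example taken from this article): each of N nodes holds a local measurement a_i, and the network seeks the x that minimizes the sum over i of 0.5 (x - a_i)^2, whose solution is the average of the a_i. The function name, the penalty parameter rho, and the data below are hypothetical.

```python
def consensus_admm(local_data, rho=1.0, iterations=50):
    """Toy consensus ADMM: minimize sum_i 0.5 * (x - a_i)^2 over a shared x.

    Each node i keeps a local copy x_i and a scaled dual variable u_i;
    only the consensus value z has to be exchanged between nodes.
    """
    n = len(local_data)
    x = [0.0] * n   # local primal variables, one per node
    u = [0.0] * n   # scaled dual variables, one per node
    z = 0.0         # global consensus variable
    for _ in range(iterations):
        # Local step: each node solves its own small subproblem in closed form
        # (these updates are independent and could run in parallel).
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_data, u)]
        # Aggregation step: average the local estimates plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: each node accumulates its disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z


measurements = [2.0, 4.0, 9.0]   # hypothetical per-node data
estimate = consensus_admm(measurements)
print(round(estimate, 6))
```

In this sketch only the scalar z is shared in each round, while the raw measurements stay at the nodes; that is the property that relieves the central processor and the links in the distributed setting described above.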

Fig. 5. An illustrative structure of bigdata aware wireless network.

C. Bigdata aware wireless network 

Once identified and extracted, data characteristics could be used to improve wireless service quality and
generate new mobile applications. For simplicity of illustration, we have postulated in Fig. 5 a structure
of bigdata aware wireless network, consisting of several mutually complementary components that enable
data-driven mobile services, whose functionalities are described below.

• Data-aware cache management: For quick access under high traffic volumes, cached contents need
to be carefully categorized, compactly organized and timely updated. Many types of content objects,
such as music and video files, are embedded with metadata labels that describe the properties of
the contents, from which the data contents could be well classified. By classifying data into a
number of sub-classes based on contents, such as sports videos and news pictures, the LUs could
achieve more accurate evaluation of the content popularity by jointly considering its own access
count and the total access count of its type, which reflects the average frequency of potential
future accesses. Accordingly, popular contents are continuously cached while unpopular contents
are removed regularly to maximize the effective system bandwidth given limited cache size.

• Crowd computing: Mobile users of similar interests could share their resources with peers in
their vicinity, either with or without taking advantage of the wireless infrastructure. For instance,
a complete 3D street view could be generated by a BS from relevant photos contributed by users
from different angles. Meanwhile, when an MT-to-BS connection is unavailable, an MT could ask for
assistance from its neighboring MTs to share available contents and applications, or even to act as
relays to the cellular network, etc. Such an idea is explored in [11], where a crowd-enabled data
transmission mechanism is proposed to let mobile users assist the data dissemination of other users.
In particular, it makes use of personal social information and market incentives to enhance the
“willingness” of mobile users to act as data brokers for others, such that a higher chance of successful
data delivery could be achieved. Essentially, this peer-to-peer nature of crowd computing exploits
user mobility and spatial correlation of data traffic, which also helps reduce the conventional
cellular traffic to and from the wireless infrastructure.

• Mobile cloud processing: Multiple interconnected C-RANs constitute a mobile cloud, which could
optimize the wireless services based on knowledge of mobile traffic patterns, especially
when user mobility spans different C-RAN clusters. For instance, based on the mobility
pattern of an MT, a CU could reserve channel resources in advance and pre-feed the contents to the
BSs along the MT’s anticipated route. As such, chunks of contents could be sent from different BSs
to achieve seamless handoffs. Similarly, the aggregate characteristic behavior of data traffic could also
be used to allocate resources such as bandwidth and cache space to some popular locations ahead
of some real-time events. This approach could reduce connection time, delay jitter, and the
burden of real-time traffic bursts on both cellular fronthaul and backhaul.

• Wireless cloudlet: The concept of cloudlets introduced in [12] defines a self-organized light cloud
with limited storage and computing power installed at the BSs to enhance their local data processing
capability. The deployment of cloudlets could effectively reduce the packet round-trip delay by an
order of magnitude. A cloudlet may be owned by the network operator but leased to commercial
clients for improving the performance of delay-sensitive applications, such as online gaming. Besides, a
cloudlet could also allow commercial clients to access the local cache to provide better location-based
services. For instance, an advertising company could send to its subscribers in the vicinity the latest
deals based on the information posted by local stores and queries made by prospective customers.
With cloudlets, real-time traffic in the backhaul network could also be largely reduced, since many
services could be provided locally instead of burdening the core network.

• Context/social-aware processing: Context/social-aware computing is an emerging paradigm for
exploiting complex data characteristics besides conventional user profiles such as mobility pattern and
demand distribution [13]. The idea of context-aware computing is to provide personalized services
adaptive to the MT’s real-time “context”, such as traveling, working, and recreation, either directly
reported by the MT or inferred from various available data. Social computing, on the other hand,
calls for wireless resource allocation to follow closely the interaction within and among social groups
[13]. Conceptually, a social group is a subset of users that share some similar interests, professions,
hobbies, and life experiences, etc. In general, a social group has unique “eigenbehaviors”, such
that the group members require and generate similar data contents. The knowledge of a social
community’s composition, activities and interests could be used to improve the wireless services for
the targeted social group members.

• Software-defined-network (SDN): SDN replaces the conventional hardware-configured routing and
forwarding devices by software programmable units. In particular, it decouples the user’s data plane
(U-plane) from the control and management plane (C-plane), such that the network is managed
by a central controller while the underlying devices are only responsible for simple functions
such as packet forwarding. Such decoupling provides unprecedented flexibility to network traffic
management, where packet forwarding decisions may now be programmed based on many new
considerations such as QoS (quality of service) requirement, application types, and payload length,
in addition to the conventional destination-oriented and distance-based metrics. For SDN-enabled
wireless networks, [14] proposes a flow-based resource management framework in C-RAN, where the
packet routing in the backhaul network and beamforming design in the wireless access network are
jointly optimized based on individual data flow’s source-destination pair, wireless channel condition,
backhaul link capacity, and user QoS requirements, etc. In the case of WLANs, [15]
introduces an SDN-based enterprise WLAN framework named Odin, which is built with programmable
functions, global knowledge of network status, and direct control of network devices. The SDN-based
system makes many difficult or costly tasks in conventional WLANs easier and less expensive,
including seamless user handoffs, global load balancing, and hidden terminal problem mitigation.
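
The popularity evaluation described above for data-aware cache management can be sketched as follows. This is a minimal illustration under stated assumptions: the article does not specify how an object's own access count and the access count of its class are combined, so the weighted sum, the weight `alpha`, and all names and numbers below are hypothetical.

```python
from collections import defaultdict


def popularity_scores(access_counts, content_class, alpha=0.7):
    """Score each object by its own access count and its class's average count.

    access_counts: dict mapping object id -> number of accesses observed
    content_class: dict mapping object id -> class label (e.g. from metadata)
    alpha: assumed weight on the object's own count (hypothetical choice)
    """
    class_total = defaultdict(int)
    class_size = defaultdict(int)
    for obj, count in access_counts.items():
        class_total[content_class[obj]] += count
        class_size[content_class[obj]] += 1
    scores = {}
    for obj, count in access_counts.items():
        label = content_class[obj]
        class_average = class_total[label] / class_size[label]
        scores[obj] = alpha * count + (1.0 - alpha) * class_average
    return scores


def select_cache(access_counts, content_class, cache_size, alpha=0.7):
    """Keep the cache_size objects with the highest popularity scores."""
    scores = popularity_scores(access_counts, content_class, alpha)
    # Sort by descending score; ties are broken by object id for determinism.
    ranked = sorted(scores, key=lambda obj: (-scores[obj], obj))
    return ranked[:cache_size]


counts = {"goal.mp4": 2, "final.mp4": 90, "storm.jpg": 10, "vote.jpg": 12}
labels = {"goal.mp4": "sports video", "final.mp4": "sports video",
          "storm.jpg": "news picture", "vote.jpg": "news picture"}
cached = select_cache(counts, labels, cache_size=3)
print(cached)
```

With this toy data, goal.mp4 is retained ahead of storm.jpg even though it has fewer accesses of its own, because its class is accessed far more often on average.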

IV. Future Research Directions 

In the mobile bigdata era, wireless system design contains rich research problems with important
applications and impact that are yet to be studied. Beyond the many research issues that arise among the
topics we have discussed so far, in this section we highlight several research
topics that we find particularly exciting.

A. Reduced-complexity fronthaul processing

In many data compression proposals, real-time calculation of the optimal compression noise covariance
matrix is often impeded by the large number of fronthaul capacity constraints and the non-convex nature
of many fronthaul-constrained problems. The problem is further exacerbated by the difficulty in generating
practical joint compression codebooks based on the obtained covariance matrix. Therefore, sub-optimal
but practical compression schemes, such as scalar quantization, should be given more consideration in
future study of fronthaul-constrained compression design. Similarly, CU-level encoding and decoding also
suffer from high computational complexity on large-scale multi-user detection and the combinatorial
nature of many limited cooperation schemes, such as optimal antenna, relay, modulation and coding
combinations, as well as BS selections. It therefore calls for practical complexity-reduction algorithms
that are truly scalable to the number of mobile users and network entities.
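
As a rough illustration of the scalar quantization mentioned above, the sketch below applies a uniform mid-rise quantizer independently to each sample, so that no joint codebook or covariance optimization is needed. The function name, the clipping range, and the bit budget are illustrative assumptions, not a scheme proposed in this article.

```python
def uniform_quantize(samples, bits, max_abs):
    """Quantize each sample independently with a uniform mid-rise quantizer.

    samples: iterable of real-valued samples (e.g. one I or Q component)
    bits: number of bits per sample sent over the fronthaul link
    max_abs: assumed clipping range; inputs are limited to [-max_abs, max_abs]
    """
    levels = 2 ** bits
    step = 2.0 * max_abs / levels
    quantized = []
    for s in samples:
        clipped = max(-max_abs, min(max_abs, s))
        # Index of the quantization cell, kept within 0 .. levels - 1.
        index = min(int((clipped + max_abs) / step), levels - 1)
        # Reconstruct at the center of the cell.
        quantized.append(-max_abs + (index + 0.5) * step)
    return quantized


received = [0.12, -0.8, 0.49, 1.7]   # hypothetical baseband samples
compressed = uniform_quantize(received, bits=3, max_abs=1.0)
print(compressed)
```

The per-sample cost is constant, so the scheme scales linearly with the number of samples, at the price of a larger distortion than a jointly optimized codebook would give for the same fronthaul rate.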

B. Cache-assisted wireless resource allocation 

BS-level caching is expected to play an important role in future wireless bigdata processing, due
to its simplicity, low cost, and natural integration with bigdata analytical tools. However, research on
cache-assisted wireless resource allocation is still in its infancy. For cache-assisted cellular networks with
BS-level caching, currently there is a shortage of both concrete theoretical analysis on the capacity gain of
cache-assisted processing and practical optimization frameworks for cache-assisted resource allocation.
Furthermore, effective and optimized integration of various identified bigdata characteristics in
cache-assisted network design is an interesting problem that awaits future investigations.

C. Distributed network traffic control 

In large-scale wireless networks, distributed control/computing algorithms could be integrated to
alleviate computational complexity of the CU, to reduce backhaul traffic volume and to mitigate the risk of
single point failures without compromising overall system performance. Owing to the programmability
of SDN-enabled system infrastructure, distributed control mechanisms could be implemented with much
better flexibility and lower cost. However, the feasibility and complexity reduction of distributed
algorithms are often constrained by the underlying problem structure, such as the coupling constraints in the
backhaul and the partial knowledge of data traffic, etc. Distributed control, or a mixed centralized and
decentralized control framework, is a promising working direction towards a future wireless networking
design supporting mobile bigdata. Additionally, the SDN-based design may also incorporate distributed
caching (at BSs and routers) to enhance the efficiency of the routing decision.

D. Mobile data security and privacy

Harvesting large mobile data sets for data analytics naturally gives rise to concerns with respect to
data security and privacy. In a cloud-based wireless network, a large amount of data is stored in the
fronthaul/backhaul network either for customers’ personal use or as a commercial database for future
analytical purposes. The system operators or commercial entities that collect the user data should be
responsible for data security and privacy. For example, personal data should be available only to legitimate
and authenticated users. Similarly, data integrity should be guaranteed such that no data is lost or modified
by unauthorized entities. Furthermore, it is also important to maintain the confidentiality of user data
both in storage and during processing. It is therefore important to develop secure yet efficient
data processing and storage methods. Promising security measures may include privacy-aware distributed
data storage and decentralized processing, which aim to maintain local data confidentiality.

V. Conclusions 

This article addresses challenges and opportunities that we face in the era of wireless big data. We 
first reviewed state-of-the-art signal processing methods and networking structures that may allow us to 
effectively manage, and in fact take advantage of, wireless bigdata traffic. We outlined the major obstacles
to bigdata signal processing and network design with respect to the scale of problem size and the complex
problem structures. Nevertheless, research on big data for wireless communications and networking is not 
only promising but also inevitable in light of the continuing data volume explosion. We also suggested 
several interesting research problems aimed at stimulating future wireless research innovations in the 
bigdata era. 